 Knowledge Discovery


Overview of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Interactive AI Magazine

IC3K 2025 (17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management) received 163 paper submissions from 40 countries. To evaluate each submission, a double-blind paper review was performed by the Program Committee. After a stringent selection process, 31 papers were published and presented as full papers, i.e. completed work (12 pages/25' oral presentation), and 81 papers were accepted as short papers (54 as oral presentations). The organizing committee included the IC3K Conference Chairs: Ricardo da Silva Torres, Artificial Intelligence Group, Wageningen University & Research, Netherlands and Jorge Bernardino, Polytechnic University of Coimbra, Portugal, and the IC3K 2025 Program Chairs: Le Gruenwald, University of Oklahoma, School of Computer Science, United States, Frans Coenen, University of Liverpool, United Kingdom, Jesualdo Tomás Fernández-Breis, University of Murcia, Spain, Lars Nolle, Jade University of Applied Sciences, Germany, Elio Masciari, University of Napoli Federico II, Italy and David Aveiro, University of Madeira, NOVA-LINCS and ARDITI, Portugal. At the closing session, the conference acknowledged a few papers that were considered excellent in their class, presenting a "Best Paper Award", "Best Student Paper Award", and "Best Poster Award" for each of the co-located conferences.


Unveiling Interesting Insights: Monte Carlo Tree Search for Knowledge Discovery

Totis, Pietro, Pozanco, Alberto, Borrajo, Daniel

arXiv.org Artificial Intelligence

Organizations are increasingly focused on leveraging data from their processes to gain insights and drive decision-making. However, converting this data into actionable knowledge remains a difficult and time-consuming task. There is often a gap between the volume of data collected and the ability to process and understand it, which automated knowledge discovery aims to fill. Automated knowledge discovery involves complex open problems, including effectively navigating data, building models to extract implicit relationships, and considering subjective goals and knowledge. In this paper, we introduce a novel method for Automated Insights and Data Exploration (AIDE), that serves as a robust foundation for tackling these challenges through the use of Monte Carlo Tree Search (MCTS). We evaluate AIDE using both real-world and synthetic data, demonstrating its effectiveness in identifying data transformations and models that uncover interesting data patterns. Among its strengths, AIDE's MCTS-based framework offers significant extensibility, allowing for future integration of additional pattern extraction strategies and domain knowledge. This makes AIDE a valuable step towards developing a comprehensive solution for automated knowledge discovery.


Towards AI-Driven Policing: Interdisciplinary Knowledge Discovery from Police Body-Worn Camera Footage

Srbinovska, Anita, Srbinovska, Angela, Senthil, Vivek, Martin, Adrian, McCluskey, John, Bateman, Jonathan, Fokoué, Ernest

arXiv.org Artificial Intelligence

This paper proposes a novel interdisciplinary framework for analyzing police body-worn camera (BWC) footage from the Rochester Police Department (RPD) using advanced artificial intelligence (AI) and statistical machine learning (ML) techniques. Our goal is to detect, classify, and analyze patterns of interaction between police officers and civilians to identify key behavioral dynamics, such as respect, disrespect, escalation, and de-escalation. We apply multimodal data analysis by integrating image, audio, and natural language processing (NLP) techniques to extract meaningful insights from BWC footage. The framework incorporates speaker separation, transcription, and large language models (LLMs) to produce structured, interpretable summaries of police-civilian encounters. We also employ a custom evaluation pipeline to assess transcription quality and behavior detection accuracy in high-stakes, real-world policing scenarios. Our methodology, computational techniques, and findings outline a practical approach for law enforcement review, training, and accountability processes while advancing the frontiers of knowledge discovery from complex police BWC data.


"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Gu, Jihao, Wang, Yingyao, Bu, Pi, Wang, Chen, Wang, Ziming, Song, Tengtao, Wei, Donglai, Yuan, Jiale, Zhao, Yingxiu, He, Yancheng, Li, Shilong, Liu, Jiaheng, Cao, Meng, Song, Jun, Tan, Yingshui, Li, Xiang, Su, Wenbo, Zheng, Zhicheng, Zhu, Xiaoyong, Zheng, Bo

arXiv.org Artificial Intelligence

The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models' knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, multi-hop question construction, high-quality data, static consistency, and ease of evaluation through short answers. Moreover, we contribute a rigorous data construction pipeline and decouple visual factuality into two parts: seeing the world (i.e., object recognition) and discovering knowledge. This decoupling allows us to analyze the capability boundaries and execution mechanisms of LVLMs. Subsequently, we evaluate 34 advanced open-source and closed-source models, revealing critical performance gaps within this field.


Overview of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Interactive AI Magazine

IC3K 2024 (16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management) received 175 paper submissions from 47 countries. To evaluate each submission, a double-blind paper review was performed by the Program Committee. After a stringent selection process, 37 papers were published and presented as full papers, i.e. completed work (12 [...]). The organizing committee included the IC3K Conference Chair: Jorge Bernardino, Polytechnic University of Coimbra, Portugal and the IC3K 2024 Program Chairs: David Aveiro, University of Madeira, NOVA-LINCS and ARDITI, Portugal, Antonella Poggi, Università di Roma "La Sapienza", Italy, Ana Fred, Instituto de Telecomunicações and Instituto Superior Técnico (University of Lisbon), Portugal, Le Gruenwald, University of Oklahoma, School of Computer Science, United States, Elio Masciari, University of Napoli Federico II, Italy and Frans Coenen, University of Liverpool, United Kingdom. At the closing session, the conference acknowledged a few papers that were considered excellent in their class, presenting a "Best Paper Award", "Best Student Paper Award" and "Best Poster Award" for each of the co-located conferences. A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer in a CCIS Series Book.


KANS: Knowledge Discovery Graph Attention Network for Soft Sensing in Multivariate Industrial Processes

Tew, Hwa Hui, Li, Gaoxuan, Ding, Fan, Luo, Xuewen, Loo, Junn Yong, Ting, Chee-Ming, Ding, Ze Yang, Tan, Chee Pin

arXiv.org Artificial Intelligence

Soft sensing of hard-to-measure variables is often crucial in industrial processes. Current practices rely heavily on conventional modeling techniques that show success in improving accuracy. However, they overlook the non-linear nature, dynamic characteristics, and non-Euclidean dependencies between complex process variables. To tackle these challenges, we present a framework known as a Knowledge discovery graph Attention Network for effective Soft sensing (KANS). Unlike the existing deep learning soft sensor models, KANS can discover the intrinsic correlations and irregular relationships between the multivariate industrial processes without a predefined topology. First, an unsupervised graph structure learning method is introduced, incorporating the cosine similarity between different sensor embeddings to capture the correlations between sensors. Next, we present a graph attention-based representation learning method that can process the multivariate data in parallel to enhance the model in learning complex sensor nodes and edges. To fully explore KANS, knowledge discovery analysis has also been conducted to demonstrate the interpretability of the model. Experimental results demonstrate that KANS significantly outperforms all the baselines and state-of-the-art methods in soft sensing performance. Furthermore, the analysis shows that KANS can find sensors closely related to different process variables without domain knowledge, significantly improving soft sensing accuracy.
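
The graph structure learning step mentioned in the abstract above (linking sensors by the cosine similarity of their embeddings) can be illustrated with a minimal sketch. This is not the KANS implementation: the top-k neighbour rule, the function names `cosine_similarity` and `top_k_graph`, and the toy embeddings are all assumptions made for illustration, and in the paper's setting the embeddings would be learned rather than fixed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_graph(embeddings, k):
    """Hypothetical helper: link each sensor to its k most similar other sensors.

    Returns a dict mapping a sensor index to a list of neighbour indices.
    The top-k rule is an assumption, not something stated in the abstract.
    """
    graph = {}
    for i, e_i in enumerate(embeddings):
        scored = [
            (cosine_similarity(e_i, e_j), j)
            for j, e_j in enumerate(embeddings)
            if j != i
        ]
        # Highest similarity first; ties broken by the lower sensor index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[i] = [j for _, j in scored[:k]]
    return graph


# Toy embeddings for four sensors (made-up values).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = top_k_graph(embeddings, k=1)
print(graph)
```

With these toy vectors, sensors 0 and 1 point in nearly the same direction, as do sensors 2 and 3, so each sensor is linked to its near-duplicate. No predefined topology is supplied; the edges come only from the embeddings.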


Knowledge Discovery using Unsupervised Cognition

Ibias, Alfredo, Antona, Hector, Ramirez-Miranda, Guillem, Guinovart, Enric

arXiv.org Artificial Intelligence

Knowledge discovery is key to understanding and interpreting a dataset, as well as to finding the underlying relationships between its components. Unsupervised Cognition is a novel unsupervised learning algorithm that focuses on modelling the learned data. This paper presents three techniques to perform knowledge discovery over an already trained Unsupervised Cognition model. Specifically, we present a technique for pattern mining, a technique for feature selection based on the previous pattern mining technique, and a technique for dimensionality reduction based on the previous feature selection technique. The final goal is to distinguish between relevant and irrelevant features and use them to build a model from which to extract meaningful patterns. We evaluated our proposals with empirical experiments and found that they outperform the state of the art in knowledge discovery.


Enhancing Biomedical Knowledge Discovery for Diseases: An End-To-End Open-Source Framework

Theodoropoulos, Christos, Coman, Andrei Catalin, Henderson, James, Moens, Marie-Francine

arXiv.org Artificial Intelligence

The ever-growing volume of biomedical publications creates a critical need for efficient knowledge discovery. In this context, we introduce an open-source end-to-end framework designed to construct knowledge around specific diseases directly from raw text. To facilitate research in disease-related knowledge discovery, we create two annotated datasets focused on Rett syndrome and Alzheimer's disease, enabling the identification of semantic relations between biomedical entities. Extensive benchmarking explores various ways to represent relations and entity representations, offering insights into optimal modeling strategies for semantic relation detection and highlighting language models' competence in knowledge discovery. We also conduct probing experiments using different layer representations and attention scores to explore transformers' ability to capture semantic relations.


A Document-based Knowledge Discovery with Microservices Architecture

Gidey, Habtom Kahsay, Kesseler, Mario, Stangl, Patrick, Hillmann, Peter, Karcher, Andreas

arXiv.org Artificial Intelligence

The first step towards digitalization within organizations lies in digitization - the conversion of analog data into digitally stored data. This basic step is the prerequisite for all following activities like the digitalization of processes or the servitization of products or offerings. However, digitization itself often leads to 'data-rich' but 'knowledge-poor' material. Knowledge discovery and knowledge extraction as approaches try to increase the usefulness of digitized data. In this paper, we point out the key challenges in the context of knowledge discovery and present an approach to addressing these using a microservices architecture. Our solution led to a conceptual design focusing on keyword extraction, similarity calculation of documents, database queries in natural language, and programming language independent provision of the extracted information. In addition, the conceptual design provides referential design guidelines for integrating processes and applications for semi-automatic learning, editing, and visualization of ontologies. The concept also uses a microservices architecture to address non-functional requirements, such as scalability and resilience. The evaluation of the specified requirements is performed using a demonstrator that implements the concept. Furthermore, this modern approach is used in the German patent office in an extended version.


Knowledge Discovery in Surveys using Machine Learning: A Case Study of Women in Entrepreneurship in UAE

Ahmad, Syed Farhan, Hermayen, Amrah, Bhavani, Ganga

arXiv.org Artificial Intelligence

Knowledge discovery plays a very important role in analyzing data and extracting insights from it to drive better business decisions. Entrepreneurship in a knowledge-based economy contributes greatly to the development of a country's economy. In this paper, we analyze surveys that were conducted on women in entrepreneurship in the UAE. Relevant insights are extracted from the data that can help us better understand the current landscape of women in entrepreneurship and predict its future. The features are analyzed using machine learning to drive better business decisions in the future.